ML crash course - Logistic regression
The Logistic regression chapter of the Machine Learning Crash Course.
developers.google.com/machine-learning/crash-course/logistic-regression
Introduction
Learning Objectives:
- Identify use cases for performing logistic regression.
- Explain how logistic regression models use the sigmoid function to calculate probability.
- Compare linear regression and logistic regression.
- Explain why logistic regression uses log loss instead of squared loss.
- Explain the importance of regularization when training logistic regression models.
Prerequisites:
Calculating a probability with the sigmoid function
This module focuses on using logistic regression model output as-is. In the Classification module, you’ll learn how to convert this output into a binary category.
Sigmoid function
The standard logistic function, also known as the sigmoid function (sigmoid means “s-shaped”), has the formula:

f(x) = \frac{1}{1 + e^{-x}}

where x is the input to the sigmoid function and e is Euler's number (roughly 2.718).
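As a quick check on the shape of the curve, here is a minimal Python sketch of the sigmoid (the function name and sample inputs are illustrative, not from the course):

import math

def sigmoid(x: float) -> float:
    # Standard logistic function: maps any real number into (0, 1).
    return 1 / (1 + math.exp(-x))

print(sigmoid(-5.0))  # ~0.0067, approaching 0
print(sigmoid(0.0))   # 0.5, the midpoint of the s-curve
print(sigmoid(5.0))   # ~0.9933, approaching 1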
From linear regression to logistic regression
You can pass the linear regression output, z = b + w_1x_1 + w_2x_2 + \dots + w_Nx_N, into the sigmoid function to obtain the logistic regression prediction, y' = \frac{1}{1 + e^{-z}}.
The output of the linear regression, z, is referred to as the log odds because, if you solve the sigmoid function for z, then z is defined as the log of the ratio of the probabilities of the two possible outcomes, y and 1 - y:

z = \log\left(\frac{y}{1 - y}\right)
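A small numeric sketch of the full pipeline (the bias, weights, and feature values below are made up for illustration): the linear output z goes through the sigmoid, and solving back from the probability recovers z as the log odds.

import math

def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

b, w, x = 1.0, [0.5, -1.2], [2.0, 0.8]        # hypothetical model and example
z = b + sum(wi * xi for wi, xi in zip(w, x))  # linear output: z = 1.04
y_prime = sigmoid(z)                          # logistic prediction: ~0.74

# The log of the odds ratio y' / (1 - y') recovers the linear output z.
assert math.isclose(math.log(y_prime / (1 - y_prime)), z)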
Loss and regularization
Logistic regression models are trained using the same process as linear regression models, with two key distinctions:
- Logistic regression models use log loss as the loss function instead of squared loss.
- Applying regularization is critical to prevent overfitting.
Log Loss
Squared loss works well for linear regression, where the rate of change of the output values is constant. However, the rate of change of a logistic regression model is not constant.
If you used squared loss to calculate errors for the sigmoid function, as the output got closer and closer to 0 and 1, you would need more memory to preserve the precision needed to track these values.
Instead, the loss function for logistic regression is log loss. The Log Loss equation returns the logarithm of the magnitude of the change, rather than just the distance from data to prediction:

\text{Log Loss} = \sum_{(x,y) \in D} -y \log(y') - (1 - y) \log(1 - y')

where D is the dataset of labeled examples, y is the label (0 or 1), and y' is the model's predicted probability, between 0 and 1.
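A minimal Python sketch of this loss (the toy label/prediction pairs are invented; real implementations clip predictions away from exactly 0 and 1 to avoid log(0)):

import math

def log_loss(examples):
    # Sum of -y*log(y') - (1 - y)*log(1 - y') over (label, prediction) pairs.
    return sum(-y * math.log(p) - (1 - y) * math.log(1 - p)
               for y, p in examples)

examples = [(1, 0.9), (0, 0.2), (1, 0.6)]  # (label, predicted probability)
print(log_loss(examples))                  # ~0.84; lower is better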
Regularization in logistic regression
Regularization, a mechanism for penalizing model complexity during training, is extremely important in logistic regression modeling. Without regularization, the asymptotic nature of logistic regression would keep driving loss towards 0 in cases where the model has a large number of features. Consequently, most logistic regression models use one of the following two strategies to decrease model complexity:
- L2 regularization
- Early stopping: limiting the number of training steps to halt training while loss is still decreasing
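As a hand-rolled sketch of why the penalty matters (the training loop, learning rate, and toy data are all made up for illustration, not the course's code), compare the learned weight with and without an L2 term:

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(xs, ys, lr=0.1, lam=0.0, steps=1000):
    # Gradient descent on log loss for one feature; lam scales an L2 penalty
    # (lam * w**2). Capping steps plays the role of early stopping.
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw = db = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # derivative of log loss w.r.t. z
            dw += err * x
            db += err
        w -= lr * (dw / len(xs) + 2 * lam * w)
        b -= lr * db / len(xs)
    return w, b

xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]  # toy, perfectly separable data
print(train(xs, ys, lam=0.0))  # unregularized: w keeps growing with more steps
print(train(xs, ys, lam=0.1))  # L2 penalty holds w at a finite value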
See also ML crash course - Datasets, generalization, and overfitting
Key terms
- Gradient descent
- Linear regression
- Log loss
- Logistic regression
- Loss function
- Overfitting
- Regularization
- Squared loss